[PoC] TPU receiver prometheus metrics #76
base: prometheus
Conversation
let max_channel_len = self.packets_count.swap(0, Ordering::Relaxed);
{
    let mut stats = self.total_stats.lock().unwrap();
Nice! We accumulate these values now 👌
What are `packet_batches_count`, `full_packet_batches_count` and `max_channel_len`?
So this is all kind of messy, but from what I understand:
- `packets_count` is just the number of UDP packets received specifically on TPU sockets.
- `packet_batches_count` is the number of "batches" received; Solana reads packets from the socket in batches of up to 64 packets.
- `full_packet_batches_count` is the number of batches that were full (so containing 64 packets).
- `max_channel_len`: after reading from the socket, packets are forwarded to an unbounded channel. This value is the max length of that channel (across all current measurements), measured each time before forwarding new packets. So it will basically be a max value over 1 second, since the stats are then reset 🤷
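To make those names concrete, here is a rough, hypothetical sketch of how such counters could be kept and updated. The field names come from this thread, but the struct, the atomic types and the `record_batch` helper are assumptions, not the actual Solana code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sketch; field names taken from the discussion above,
// everything else (types, update site) is an assumption.
struct ReceiverStatsSketch {
    packets_count: AtomicUsize,             // UDP packets received on TPU sockets
    packet_batches_count: AtomicUsize,      // batches read from the socket (up to 64 packets each)
    full_packet_batches_count: AtomicUsize, // batches that contained the full 64 packets
    max_channel_len: AtomicUsize,           // max observed length of the forwarding channel
}

impl ReceiverStatsSketch {
    // Called after reading one batch from the socket, just before forwarding it.
    fn record_batch(&self, batch_len: usize, channel_len: usize) {
        self.packets_count.fetch_add(batch_len, Ordering::Relaxed);
        self.packet_batches_count.fetch_add(1, Ordering::Relaxed);
        if batch_len == 64 {
            self.full_packet_batches_count.fetch_add(1, Ordering::Relaxed);
        }
        // Track the maximum channel length seen since the last report/reset.
        self.max_channel_len.fetch_max(channel_len, Ordering::Relaxed);
    }
}
```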
Can you add those as the `HELP` line in the exported metrics?
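For illustration only, a minimal sketch of how such `HELP` lines could be written into the text exposition output; the metric names and wording below are invented, not what this PR currently exports.

```rust
use std::fmt::Write;

// Hypothetical metric names; the real exporter may name and group these differently.
fn write_tpu_receiver_help(out: &mut String) -> std::fmt::Result {
    writeln!(out, "# HELP solana_tpu_packets_received_total UDP packets received on TPU sockets.")?;
    writeln!(out, "# TYPE solana_tpu_packets_received_total counter")?;
    writeln!(out, "# HELP solana_tpu_packet_batches_received_total Batches (up to 64 packets) read from TPU sockets.")?;
    writeln!(out, "# TYPE solana_tpu_packet_batches_received_total counter")?;
    writeln!(out, "# HELP solana_tpu_receiver_max_channel_len Max observed length of the receive channel since the last report.")?;
    writeln!(out, "# TYPE solana_tpu_receiver_max_channel_len gauge")?;
    Ok(())
}
```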
> So it will basically be a max value over 1 second, since the stats are then reset
I don’t know how it measures this, but if the channel is backed by something similar to a `Vec` (I think it might be a deque in practice, but the same applies), and the max measures the capacity of the vec/deque, then it would persist even if the metrics are reset, unless they `shrink_to_fit`. But this is pure speculation, I haven’t looked into how it is implemented.
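As a small aside on that speculation: in Rust the backing capacity of a `VecDeque` does survive clearing its contents unless `shrink_to_fit` is called, which is easy to check (this says nothing about how the channel in question is actually implemented):

```rust
use std::collections::VecDeque;

fn main() {
    let mut queue: VecDeque<u64> = VecDeque::new();
    queue.extend(0..10_000); // grow the backing buffer
    queue.clear();           // length drops back to 0 ...
    assert!(queue.capacity() >= 10_000); // ... but the capacity persists
    queue.shrink_to_fit();   // only this releases the memory
    assert!(queue.capacity() < 10_000);
}
```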
@@ -379,6 +380,7 @@ impl Validator {
    socket_addr_space: SocketAddrSpace,
    use_quic: bool,
    tpu_connection_pool_size: usize,
    prometheus_collector: Option<PrometheusCollector>,
Although it makes sense to put new arguments at the end of the function parameters, to avoid conflicts it's better to put new arguments in the middle of the other arguments; that makes rebasing easier if something changes in the argument list.
This looks pretty clean and not very invasive, which is a good thing 👍 I wonder though what would happen if we turned it around: instead of making `StreamerReceiveStats` have a reference to the metrics and push to it, could the Prometheus config have a reference to `StreamerReceiveStats` and read from it? I can’t predict what upstream Solana people will say, but so far it looks like they are very concerned about this slowing down the validator. (Though a single branch is negligible either way compared to all the string formatting for the logs ...) I genuinely don’t know how it would turn out, but it may be worth trying to see which approach is cleaner.
Also it would be nice if we could avoid the polling/pushing delay, but I don’t know if that will make it much more complex. Hooking the `report` function is definitely tempting because it makes it a small and self-contained change.
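A very rough sketch of that inverted direction, under the assumption that the counters are kept as running totals; `ReceiverStats` and `Collector` here are placeholders, not the actual `StreamerReceiveStats`/`PrometheusCollector` types in this PR.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};

// Placeholder stand-in for StreamerReceiveStats, just to show the shape of the idea.
struct ReceiverStats {
    packets_count: AtomicU64, // assumed to be a running total, never reset
}

// The Prometheus side keeps shared handles and reads the atomics at scrape time.
#[derive(Default)]
struct Collector {
    tpu_receivers: Vec<Arc<ReceiverStats>>,
}

impl Collector {
    fn register_tpu_receiver(&mut self, stats: Arc<ReceiverStats>) {
        self.tpu_receivers.push(stats);
    }

    // Called from the /metrics handler; no background polling loop involved.
    fn render(&self) -> String {
        let total: u64 = self
            .tpu_receivers
            .iter()
            .map(|s| s.packets_count.load(Ordering::Relaxed))
            .sum();
        format!("solana_tpu_packets_received_total {total}\n")
    }
}
```

In this arrangement the receiving hot path would not need to know about Prometheus at all; it only keeps incrementing the atomics it already owns.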
let stats = {
    tpu_stats.total_stats.lock().unwrap().clone()
};
collector.lock().unwrap().save_tpu_receiver_stats(stats)
If I understand correctly, this will keep one copy of the metrics with the Prometheus config, periodically “push” to it, and then the endpoint serves whatever the latest value it had is.
This creates a polling delay: the value that we serve on `/metrics` can be outdated by the cycle interval of this loop. It looks like that interval is 1 second, so it’s not too bad, but it would be nicer if we could get the actual current values. If we can get the `StreamerReceiveStats` into the Prometheus endpoint, it could do the atomic reads there, but it means we would need to make them totals. We could then handle the resetting here, at this point.
Something like this:
use std::sync::atomic::{AtomicU64, Ordering};

// Snapshots are plain u64s, so they can be Copy; the atomic variant never needs it.
#[derive(Clone, Copy)]
struct GoatCounter<T> {
    num_goats_teleported: T,
    num_goats_received: T,
}

type GoatCounterCurrent = GoatCounter<AtomicU64>;
type GoatCounterSnapshot = GoatCounter<u64>;

impl GoatCounterCurrent {
    pub fn snapshot(&self) -> GoatCounterSnapshot {
        GoatCounterSnapshot {
            num_goats_teleported: self.num_goats_teleported.load(Ordering::Relaxed),
            num_goats_received: self.num_goats_received.load(Ordering::Relaxed),
        }
    }
}

impl std::ops::Sub for GoatCounterSnapshot {
    type Output = Self;

    fn sub(self, rhs: Self) -> Self {
        Self {
            num_goats_teleported: self.num_goats_teleported - rhs.num_goats_teleported,
            num_goats_received: self.num_goats_received - rhs.num_goats_received,
        }
    }
}

// In the place where we print the metrics:
let mut last_logged: GoatCounterSnapshot = todo!();
let current = goat_counter.snapshot();
log(current - last_logged);
last_logged = current;

// In the Prometheus endpoint:
let current = goat_counter.snapshot();
write_metrics(&current, ...);
Not sure if this is feasible in practice though, it looks like there is something going on with accumulation over multiple threads? Also, having the separate fields as atomic ints can lead to surprising results. For example, if you are trying to compute an error rate, and you have counters `num_requests` and `num_errors`, where it should be the case that `num_errors <= num_requests`, then if we load those counters with `Ordering::Relaxed`, that inequality could be violated. It requires some thought both at the place where the counters are read and the place where they are incremented to get it right. I’m not sure Solana does this, since they prioritize speed over correctness, and `Ordering::Relaxed` does give the compiler and CPU more freedom to optimize than some of the other orderings ...
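For what it’s worth, one way (of several) to preserve the observed `num_errors <= num_requests` invariant is to pair the increment order with the read order and a release/acquire edge on the error counter. This is just an illustration of the concern, not how Solana’s counters work.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct RequestStats {
    num_requests: AtomicU64,
    num_errors: AtomicU64,
}

impl RequestStats {
    // Writer side: count the request before any error for it can be counted.
    fn record_request(&self) {
        self.num_requests.fetch_add(1, Ordering::Relaxed);
    }

    fn record_error(&self) {
        // Release: the matching request increment (sequenced before this)
        // becomes visible to any reader that acquires this counter.
        self.num_errors.fetch_add(1, Ordering::Release);
    }

    // Reader side: load errors first with Acquire, then requests, so every
    // observed error comes with its request and the rate cannot exceed 100%.
    fn error_rate(&self) -> f64 {
        let errors = self.num_errors.load(Ordering::Acquire);
        let requests = self.num_requests.load(Ordering::Relaxed);
        if requests == 0 {
            0.0
        } else {
            errors as f64 / requests as f64
        }
    }
}
```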
Hmm, I see, so that would basically result in changing how the logic of the current Solana metrics is implemented. I would be a bit hesitant to change this Solana metrics logic, because to make it consistent we would likely need to do it in all the places this pattern is followed (which seems to be a lot).
> Not sure if this is feasible in practice though, it looks like there is something going on with accumulation over multiple threads?
Yes, that is what is happening.
> Also, having the separate fields as atomic ints can lead to surprising results. For example, if you are trying to compute an error rate, and you have counters num_requests and num_errors
Yeah, there are definitely possible inconsistencies, but I guess that is really common in metrics. Using the example of `num_requests` and `num_errors` in some other application: `num_requests` is likely to be incremented at the beginning of handling the request, while `num_errors` for obvious reasons can only be incremented later, so it won't be an atomic operation.
P.S. I now want to see this whole application for teleporting goats 🤣
> num_requests is likely to be incremented at the beginning of handling the request while num_errors for obvious reasons can only be incremented later, so it won't be an atomic operation.
That by itself is not an issue, since you still couldn’t get a >100% error rate. If you increment the error counter before the request counter, then you could get a >100% error rate, which is really counter-intuitive. (Or if you read the request counter before the error counter.)
> I now want to see this whole application for teleporting goats
You might enjoy these :-)
https://crbug.com/31482
https://codereview.chromium.org/543045
@Szymongib do you still plan to pick this up some time? It’s not high prio, but I think it would be nice to have.
Hmm, I thought we did not really want to pursue it further, but if you think it is worth it I will add it back to my todo list.
Summary of Changes
This is a proof of concept of how we could get some other metrics into Prometheus.
This approach works and does not change the behaviour of the existing Solana metrics, but it has some downsides, mainly the need to maintain `_total` variables and update them correctly (perhaps we could do some of that with proc macros, building on @enriquefynn's idea).
This is a draft so we can discuss this or other possible approaches.
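To make the `_total` bookkeeping concrete, here is a hedged sketch of the kind of accumulation step this implies; the type and field names are invented for illustration and do not match the PR's code.

```rust
// Invented names; only meant to show the accumulate-into-totals pattern.
#[derive(Default, Clone)]
struct TpuReceiverTotals {
    packets_count_total: u64,
    packet_batches_count_total: u64,
}

impl TpuReceiverTotals {
    // Fold the per-interval (reset-every-second) values into running totals,
    // so Prometheus can expose them as monotonically increasing counters.
    fn accumulate(&mut self, interval_packets: u64, interval_batches: u64) {
        self.packets_count_total += interval_packets;
        self.packet_batches_count_total += interval_batches;
    }
}
```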